Kernel Identification Through Transformers

Neural Information Processing Systems

Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the self-attention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.
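The synthetic-data step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a small, hypothetical vocabulary of three kernels with fixed hyperparameters, draws one kernel uniformly at random, and samples function values from the corresponding zero-mean GP prior at random inputs. The kernel names, the uniform choice, the input range, and the helper `sample_training_example` are illustrative assumptions, not the authors' actual generation procedure.

```python
import math
import random

# Hypothetical kernel vocabulary (assumption: forms and hyperparameters
# are illustrative, not those used to train KITT).
def rbf(x, y, lengthscale=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / lengthscale ** 2)

def matern12(x, y, lengthscale=1.0):
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-d / lengthscale)

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

VOCABULARY = {"rbf": rbf, "matern12": matern12, "linear": linear}

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_training_example(n_points, input_dim, rng, jitter=1e-6):
    """Return (inputs, outputs, kernel_name) for one synthetic dataset."""
    name = rng.choice(sorted(VOCABULARY))  # assumption: uniform prior over kernels
    kernel = VOCABULARY[name]
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
          for _ in range(n_points)]
    # Gram matrix with diagonal jitter for numerical stability.
    gram = [[kernel(xi, xj) + (jitter if i == j else 0.0)
             for j, xj in enumerate(xs)]
            for i, xi in enumerate(xs)]
    chol = cholesky(gram)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    # f = L z is a draw from N(0, K) when K = L L^T.
    ys = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(n_points)]
    return xs, ys, name
```

Calling `sample_training_example(8, 3, random.Random(0))` yields eight three-dimensional inputs, eight sampled outputs, and the name of the generating kernel, which could serve as the label in a supervised training pair.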



DevFly: Bio-inspired Development of Binary Connections for Locality Preserving Sparse Codes

Neural Information Processing Systems

Neural circuits undergo developmental processes which can be influenced by experience. Here we explore a bio-inspired development process to form the connections in a network used for locality sensitive hashing. The network is a simplified model of the insect mushroom body, which has sparse connections from the input layer to a second layer of higher dimension, forming a sparse code. In previous versions of this model, connectivity between the layers is random. We investigate whether the performance of the hash, evaluated in nearest neighbour query tasks, can be improved by a process of developing the connections, in which the strongest input dimensions in successive samples are wired to each successive coding dimension. Experiments show that the accuracy of searching for nearest neighbours is improved, although performance is dependent on the parameter values and datasets used. Our approach is also much faster than alternative methods that have been proposed for training the connections in this model. Importantly, the development process does not impact connections built at an earlier stage, which should provide stable coding results for simultaneous learning in a downstream network.
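The wiring rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions: coding unit j is wired to the strongest input dimensions of sample j (cycling through the samples if there are more coding units than samples), and the sparse code is formed by keeping the most active coding units (a winner-take-all step). The function names, the cycling rule, and the winner-take-all step are hypothetical choices for illustration and may differ from the paper's procedure.

```python
def develop_connections(samples, n_coding, n_per_unit):
    """Build binary connections one coding unit at a time.

    Unit j is connected to the n_per_unit largest input dimensions of
    sample j. Earlier units are never modified by later samples.
    Assumption: samples are reused cyclically if n_coding > len(samples).
    """
    connections = []
    for j in range(n_coding):
        sample = samples[j % len(samples)]
        ranked = sorted(range(len(sample)), key=lambda i: sample[i], reverse=True)
        connections.append(sorted(ranked[:n_per_unit]))
    return connections

def sparse_code(x, connections, n_active):
    """Return the indices of the n_active most strongly driven coding units.

    Assumption: a winner-take-all step produces the sparse binary code.
    """
    activity = [sum(x[i] for i in inputs) for inputs in connections]
    winners = sorted(range(len(activity)), key=lambda j: activity[j], reverse=True)
    return frozenset(winners[:n_active])

# Two samples with four input dimensions, two coding units, two inputs each.
samples = [[0.9, 0.1, 0.8, 0.0], [0.0, 0.7, 0.1, 0.6]]
connections = develop_connections(samples, n_coding=2, n_per_unit=2)
query_code = sparse_code([1.0, 0.0, 1.0, 0.0], connections, n_active=1)
```

Because each coding unit is wired from a single sample and never revisited, adding further units leaves the existing code indices unchanged, which is the stability property the abstract highlights.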




Appendix A Details

Neural Information Processing Systems

More details on each of these datasets are given below. This data is referred to as "in-domain" because the validation data is generated using the same ... As for cache hits, they are also not counted as visits. Figure 9: MCTS-guided decoding algorithm for symbolic regression, with the pre-trained transformer model used for the expansion and evaluation steps. ... MCTS algorithm (Figure 1), which can be used in a similar fashion but without sharing information with the pre-trained transformer. The approach involves fine-tuning an actor-critic-like model to adjust the pre-trained model on a group of symbolic regression instances.